Non-metric multi-dimensional scaling for distance-based privacy-preserving data mining
Author
Abstract
Faculty of Science, School of Computing Sciences. Doctor of Philosophy, by Khaled S. Alotaibi.

Recent advances in data mining have raised major concerns about privacy. Sharing data with external parties for analysis puts private information at risk, so the original data are often perturbed before external release. However, perturbation can reduce the utility of the output, and a good perturbation technique must balance privacy against utility. This study proposes a new method for data perturbation in the context of distance-based data mining. We propose non-metric multi-dimensional scaling (MDS) as a suitable technique for perturbing data that are intended for distance-based data mining. The basic premise of this approach is to transform the original data into a lower-dimensional space and generate new data that protect private details while maintaining good utility for distance-based analysis. We investigate the extent to which the perturbed data preserve useful statistics for distance-based analysis and provide protection against malicious attacks. We demonstrate that our method is an adequate alternative to data randomisation approaches and to other dimensionality reduction approaches. Testing is conducted on a wide range of benchmark datasets and against some existing perturbation methods. The results confirm that our method has very good overall performance, is competitive with other techniques, and produces clustering and classification results at least as good as, and in some cases better than, the results obtained from the original data.
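Non-metric MDS aims to preserve the rank order of pairwise dissimilarities rather than their exact values, which is why a lower-dimensional release can remain useful for distance-based mining. The following is a minimal sketch of one way to check that property on a perturbed dataset. It is not the evaluation procedure used in the thesis: the function names `pairwise_distances` and `rank_order_agreement` and all data values are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of records, in a fixed order."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def rank_order_agreement(original, perturbed):
    """Fraction of distance comparisons whose ordering survives perturbation.

    Returns 1.0 when every "pair A is closer than pair B" relation in the
    original data also holds in the perturbed data. The two datasets may
    have different numbers of attributes, but must list records in the
    same order.
    """
    d_orig = pairwise_distances(original)
    d_pert = pairwise_distances(perturbed)
    concordant = comparable = 0
    for i, j in combinations(range(len(d_orig)), 2):
        delta_orig = d_orig[i] - d_orig[j]
        delta_pert = d_pert[i] - d_pert[j]
        if delta_orig == 0:
            continue  # ties in the original carry no ordering to preserve
        comparable += 1
        if delta_orig * delta_pert > 0:
            concordant += 1
    return concordant / comparable if comparable else 1.0


# Hypothetical records with three attributes (illustrative values only).
original = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (5, 5, 1)]

# A 2-D stand-in for a perturbed release: the distances change,
# but their rank order does not.
projected = [(0, 0), (1, 0), (0, 3), (5, 5)]

# A cruder 1-D release that distorts the ordering of several pairs.
collapsed = [(0,), (1,), (0,), (5,)]

print(rank_order_agreement(original, projected))
print(rank_order_agreement(original, collapsed))
```

An agreement close to 1 suggests that distance-based methods such as nearest-neighbour classification would behave similarly on the perturbed data. The score says nothing about privacy, which has to be assessed separately.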
Similar resources
Privacy-Preserving SVM Classification using Non-metric MDS
Privacy concerns are a critical issue in outsourcing data mining projects. Data owners are often unwilling to release their private data for analysis, as this may lead to data disclosure. One possible solution to address such concerns is to perturb the original data values so that they become hidden, thereby preserving privacy. This paper proposes a privacy-preserving technique using Non-metric...
A Condensation Approach to Privacy Preserving Data Mining
In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy preserving data mining of multi-dimensional ...
Enhanced Batch Generation based Multilevel Trust Privacy Preserving in Data Mining
The motivation of Privacy Preserving Data Mining (PPDM) is to obtain valid data mining results without access to the original sensitive information. The perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data are published. This proposed work is based on perturbation based privacy preserving d...
Distance Based Clustering of Association Rules
Association rule mining is one of the most important procedures in data mining. In industry applications, often more than 10,000 rules are discovered. To allow manual inspection and support knowledge discovery, the number of rules has to be reduced significantly by techniques such as pruning or grouping. In this paper, we present a new normalized distance metric to group association rules. Base...
Additive Gaussian Noise Based Data Perturbation in Multi-level Trust Privacy Preserving Data Mining
Data perturbation is one of the most popular models used in privacy preserving data mining. It is especially convenient for applications where the data owners need to export or publish privacy-sensitive data. This work proposes an additive-perturbation-based Privacy Preserving Data Mining (PPDM) approach to deal with the problem of building increasingly accurate models about all data without knowing exact det...
Publication date: 2014